12 research outputs found
Action recognition in depth videos using nonparametric probabilistic graphical models
Action recognition involves automatically labelling videos that contain human motion with action classes. It has applications in diverse areas such as smart surveillance, human computer interaction and content retrieval. The recent advent of depth sensing technology that produces depth image sequences has offered opportunities to solve the challenging action recognition problem. The depth images facilitate robust estimation of a human skeleton’s 3D joint positions and a high level action can be inferred from a sequence of these joint positions.
A natural way to model a sequence of joint positions is to use a graphical model that describes probabilistic dependencies between the observed joint positions and some hidden state variables. A problem with these models is that the number of hidden states must be fixed a priori even though for many applications this number is not known in advance. This thesis proposes nonparametric variants of graphical models with the number of hidden states automatically inferred from data. The inference is performed in a full Bayesian setting by using the Dirichlet Process as a prior over the model’s infinite dimensional parameter space.
This thesis describes three original constructions of nonparametric graphical models that are applied in the classification of actions in depth videos. Firstly, the action classes are represented by a Hidden Markov Model (HMM) with an unbounded number of hidden states. The formulation enables information sharing and discriminative learning of parameters. Secondly, a hierarchical HMM with an unbounded number of actions and poses is used to represent activities. The construction produces a simplified model for activity classification by using logistic regression to capture the relationship between action states and activity labels. Finally, the action classes are modelled by a Hidden Conditional Random Field (HCRF) with the number of intermediate hidden states learned from data. Tractable inference procedures based on Markov Chain Monte Carlo (MCMC) techniques are derived for all these constructions. Experiments with multiple benchmark datasets confirm the efficacy of the proposed approaches for action recognition
Synthetic Text Generation using Hypergraph Representations
Generating synthetic variants of a document is often posed as text-to-text
transformation. We propose an alternate LLM based method that first decomposes
a document into semantic frames and then generates text using this interim
sparse format. The frames are modeled using a hypergraph, which allows
perturbing the frame contents in a principled manner. Specifically, new
hyperedges are mined through topological analysis and complex polyadic
relationships including hierarchy and temporal dynamics are accommodated. We
show that our solution generates documents that are diverse, coherent and vary
in style, sentiment, format, composition and facts
Non-parametric hidden conditional random fields for action classification
Conditional Random Fields (CRF), a structured prediction method, combines probabilistic graphical models and discriminative classification techniques in order to predict class labels in sequence recognition problems. Its extension the Hidden Conditional Random Fields (HCRF) uses hidden state variables in order to capture intermediate structures. The number of hidden states in an HCRF must be specified a priori. This number is often not known in advance. A non-parametric extension to the HCRF, with the number of hidden states automatically inferred from data, is proposed here. This is a significant advantage over the classical HCRF since it avoids ad hoc model selection procedures. Further, the training and inference procedure is fully Bayesian eliminating the over fitting problem associated with frequentist methods. In particular, our construction is based on scale mixtures of Gaussians as priors over the HCRF parameters and makes use of Hierarchical Dirichlet Process (HDP) and Laplace distribution. The proposed inference procedure uses elliptical slice sampling, a Markov Chain Monte Carlo (MCMC) method, in order to sample optimal and sparse posterior HCRF parameters. The above technique is applied for classifying human actions that occur in depth image sequences – a challenging computer vision problem. Experiments with real world video datasets confirm the efficacy of our classification approach
Action classification using a discriminative multilevel HDP-HMM
We classify human actions occurring in depth image sequences using features based on skeletal joint positions. The action classes are represented by a multi-level Hierarchical Dirichlet Process – Hidden Markov Model (HDP-HMM). The non-parametric HDP-HMM allows the inference of hidden states automatically from training data. The model parameters of each class are formulated as transformations from a shared base distribution, thus promoting the use of unlabelled examples during training and borrowing information across action classes. Further, the parameters are learnt in a discriminative way. We use a normalized gamma process representation of HDP and margin based likelihood functions for this purpose. We sample parameters from the complex posterior distribution induced by our discriminative likelihood function using elliptical slice sampling. Experiments with two different datasets show that action class models learnt using our technique produce good classification results
Action recognition in depth videos using nonparametric probabilistic graphical models
Action recognition involves automatically labelling videos that contain human motion with action classes. It has applications in diverse areas such as smart surveillance, human computer interaction and content retrieval. The recent advent of depth sensing technology that produces depth image sequences has offered opportunities to solve the challenging action recognition problem. The depth images facilitate robust estimation of a human skeleton’s 3D joint positions and a high level action can be inferred from a sequence of these joint positions.
A natural way to model a sequence of joint positions is to use a graphical model that describes probabilistic dependencies between the observed joint positions and some hidden state variables. A problem with these models is that the number of hidden states must be fixed a priori even though for many applications this number is not known in advance. This thesis proposes nonparametric variants of graphical models with the number of hidden states automatically inferred from data. The inference is performed in a full Bayesian setting by using the Dirichlet Process as a prior over the model’s infinite dimensional parameter space.
This thesis describes three original constructions of nonparametric graphical models that are applied in the classification of actions in depth videos. Firstly, the action classes are represented by a Hidden Markov Model (HMM) with an unbounded number of hidden states. The formulation enables information sharing and discriminative learning of parameters. Secondly, a hierarchical HMM with an unbounded number of actions and poses is used to represent activities. The construction produces a simplified model for activity classification by using logistic regression to capture the relationship between action states and activity labels. Finally, the action classes are modelled by a Hidden Conditional Random Field (HCRF) with the number of intermediate hidden states learned from data. Tractable inference procedures based on Markov Chain Monte Carlo (MCMC) techniques are derived for all these constructions. Experiments with multiple benchmark datasets confirm the efficacy of the proposed approaches for action recognition
Bayesian Hierarchical Models for Counterfactual Estimation
Counterfactual explanations utilize feature perturbations to analyze the
outcome of an original decision and recommend an actionable recourse. We argue
that it is beneficial to provide several alternative explanations rather than a
single point solution and propose a probabilistic paradigm to estimate a
diverse set of counterfactuals. Specifically, we treat the perturbations as
random variables endowed with prior distribution functions. This allows
sampling multiple counterfactuals from the posterior density, with the added
benefit of incorporating inductive biases, preserving domain specific
constraints and quantifying uncertainty in estimates. More importantly, we
leverage Bayesian hierarchical modeling to share information across different
subgroups of a population, which can both improve robustness and measure
fairness. A gradient based sampler with superior convergence characteristics
efficiently computes the posterior samples. Experiments across several datasets
demonstrate that the counterfactuals estimated using our approach are valid,
sparse, diverse and feasible
Synthetic Document Generator for Annotation-free Layout Recognition
Analyzing the layout of a document to identify headers, sections, tables,
figures etc. is critical to understanding its content. Deep learning based
approaches for detecting the layout structure of document images have been
promising. However, these methods require a large number of annotated examples
during training, which are both expensive and time consuming to obtain. We
describe here a synthetic document generator that automatically produces
realistic documents with labels for spatial positions, extents and categories
of the layout elements. The proposed generative process treats every physical
component of a document as a random variable and models their intrinsic
dependencies using a Bayesian Network graph. Our hierarchical formulation using
stochastic templates allow parameter sharing between documents for retaining
broad themes and yet the distributional characteristics produces visually
unique samples, thereby capturing complex and diverse layouts. We empirically
illustrate that a deep layout detection model trained purely on the synthetic
documents can match the performance of a model that uses real documents
Activity recognition using a supervised non-parametric hierarchical HMM
The problem of classifying human activities occurring in depth image sequences is addressed. The 3D joint positions of a human skeleton and the local depth image pattern around these joint positions define the features. A two level hierarchical Hidden Markov Model (H-HMM), with independent Markov chains for the joint positions and depth image pattern, is used to model the features. The states corresponding to the H-HMM bottom level characterize the granular poses while the top level characterizes the coarser actions associated with the activities. Further, the H-HMM is based on a Hierarchical Dirichlet Process (HDP), and is fully non-parametric with the number of pose and action states inferred automatically from data. This is a significant advantage over classical HMM and its extensions. In order to perform classification, the relationships between the actions and the activity labels are captured using multinomial logistic regression. The proposed inference procedure ensures alignment of actions from activities with similar labels. Our construction enables information sharing, allows incorporation of unlabelled examples and provides a flexible factorized representation to include multiple data channels. Experiments with multiple real world datasets show the efficacy of our classification approach
DocLLM: A layout-aware generative language model for multimodal document understanding
Enterprise documents such as forms, invoices, receipts, reports, contracts,
and other similar records, often carry rich semantics at the intersection of
textual and spatial modalities. The visual cues offered by their complex
layouts play a crucial role in comprehending these documents effectively. In
this paper, we present DocLLM, a lightweight extension to traditional large
language models (LLMs) for reasoning over visual documents, taking into account
both textual semantics and spatial layout. Our model differs from existing
multimodal LLMs by avoiding expensive image encoders and focuses exclusively on
bounding box information to incorporate the spatial layout structure.
Specifically, the cross-alignment between text and spatial modalities is
captured by decomposing the attention mechanism in classical transformers to a
set of disentangled matrices. Furthermore, we devise a pre-training objective
that learns to infill text segments. This approach allows us to address
irregular layouts and heterogeneous content frequently encountered in visual
documents. The pre-trained model is fine-tuned using a large-scale instruction
dataset, covering four core document intelligence tasks. We demonstrate that
our solution outperforms SotA LLMs on 14 out of 16 datasets across all tasks,
and generalizes well to 4 out of 5 previously unseen datasets.Comment: 16 pages, 4 figure
Municipal Bond Pricing: A Data Driven Method
Price evaluations of municipal bonds have traditionally been performed by human experts based on their market knowledge and trading experience. Automated evaluation is an attractive alternative providing the advantage of an objective estimation that is transparent, consistent, and scalable. In this paper, we present a statistical model to automatically estimate U.S municipal bond yields based on trade transactions and study the agreement between human evaluations and machine generated estimates. The model uses piecewise polynomials constructed using basis functions. This provides immense flexibility in capturing the wide dispersion of yields. A novel transfer learning based approach that exploits the latent hierarchical relationship of the bonds is applied to enable robust yield estimation even in the absence of adequate trade data. The Bayesian nature of our model offers a principled framework to account for uncertainty in the estimates. Our inference procedure scales well even for large data sets. We demonstrate the empirical effectiveness of our model by assessing over 100,000 active bonds and find that our estimates are in line with hand priced evaluations for a large number of bonds